tight analysis
Reviews: Small ReLU networks are powerful memorizers: a tight analysis of memorization capacity
The paper investigates the problem of expressiveness in neural networks w.r.t. The authors also show an upper bound for classification, a corollary of which is that a three hidden layer network with hidden layers of sized 2k-2k-4k can perfectly classify ImageNet. Moreover, they show that if the overall sum of hidden nodes in a ResNet is of order N/d_x, where d_x is the input dimension then again the network can perfectly realize the data. Lastly, an analysis is given showing batch SGD that is initialized close to a global minimum will come close to a point with value significantly smaller than the loss in the initialization (though a convergence guarantee could not be given). The paper is clear and easy to follow for the most part, and conveys a feeling that the authors did their best to make the analysis as thorough and exhausting as possible, providing results for various settings.
Tight Analysis of Extra-gradient and Optimistic Gradient Methods For Nonconvex Minimax Problems
Despite the established convergence theory of Optimistic Gradient Descent Ascent (OGDA) and Extragradient (EG) methods for the convex-concave minimax problems, little is known about the theoretical guarantees of these methods in nonconvex settings. To bridge this gap, for the first time, this paper establishes the convergence of OGDA and EG methods under the nonconvex-strongly-concave (NC-SC) and nonconvex-concave (NC-C) settings by providing a unified analysis through the lens of single-call extra-gradient methods. We further establish lower bounds on the convergence of GDA/OGDA/EG, shedding light on the tightness of our analysis. We also conduct experiments supporting our theoretical results. We believe our results will advance the theoretical understanding of OGDA and EG methods for solving complicated nonconvex minimax real-world problems, e.g., Generative Adversarial Networks (GANs) or robust neural networks training.
Small ReLU networks are powerful memorizers: a tight analysis of memorization capacity
We study finite sample expressivity, i.e., memorization power of ReLU networks. Recent results require N hidden nodes to memorize/interpolate arbitrary N data points. In contrast, by exploiting depth, we show that 3-layer ReLU networks with \Omega(\sqrt{N}) hidden nodes can perfectly memorize most datasets with N points. We also prove that width \Theta(\sqrt{N}) is necessary and sufficient for memorizing N data points, proving tight bounds on memorization capacity. The sufficiency result can be extended to deeper networks; we show that an L -layer network with W parameters in the hidden layers can memorize N data points if W \Omega(N) .
Small ReLU networks are powerful memorizers: a tight analysis of memorization capacity
Yun, Chulhee, Sra, Suvrit, Jadbabaie, Ali
We study finite sample expressivity, i.e., memorization power of ReLU networks. Recent results require $N$ hidden nodes to memorize/interpolate arbitrary $N$ data points. In contrast, by exploiting depth, we show that 3-layer ReLU networks with $\Omega(\sqrt{N})$ hidden nodes can perfectly memorize most datasets with $N$ points. We also prove that width $\Theta(\sqrt{N})$ is necessary and sufficient for memorizing $N$ data points, proving tight bounds on memorization capacity. The sufficiency result can be extended to deeper networks; we show that an $L$-layer network with $W$ parameters in the hidden layers can memorize $N$ data points if $W \Omega(N)$.